{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# NVIDIA NIMs\n",
    "\n",
    "The `llama-index-llms-nvidia` package contains LlamaIndex integrations building applications with models on \n",
    "NVIDIA NIM inference microservice. NIM supports models across domains like chat, embedding, and re-ranking models \n",
    "from the community as well as NVIDIA. These models are optimized by NVIDIA to deliver the best performance on NVIDIA \n",
    "accelerated infrastructure and deployed as a NIM, an easy-to-use, prebuilt containers that deploy anywhere using a single \n",
    "command on NVIDIA accelerated infrastructure.\n",
    "\n",
    "NVIDIA hosted deployments of NIMs are available to test on the [NVIDIA API catalog](https://build.nvidia.com/). After testing, \n",
    "NIMs can be exported from NVIDIA’s API catalog using the NVIDIA AI Enterprise license and run on-premises or in the cloud, \n",
    "giving enterprises ownership and full control of their IP and AI application.\n",
    "\n",
    "NIMs are packaged as container images on a per model basis and are distributed as NGC container images through the NVIDIA NGC Catalog. \n",
    "At their core, NIMs provide easy, consistent, and familiar APIs for running inference on an AI model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index-core\n",
    "!pip install llama-index-readers-file\n",
    "!pip install llama-index-llms-nvidia\n",
    "!pip install llama-index-embeddings-nvidia\n",
    "!pip install llama-index-postprocessor-nvidia-rerank"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Bring in a test dataset, a PDF about housing construction in San Francisco in 2021."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--2024-05-28 17:42:44--  https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0\n",
      "Resolving www.dropbox.com (www.dropbox.com)... 162.125.1.18, 2620:100:6016:18::a27d:112\n",
      "Connecting to www.dropbox.com (www.dropbox.com)|162.125.1.18|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: https://ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com/cd/0/inline/CTzJ0ZeHC3AFIV3iv1bv9v0oMNXW03OW2waLdeKJNs0X6Tto0MSewm9RZBHwSLhqk4jWFaCmbhMGVXeWa6xPO4mAR4hC3xflJfwgS9Z4lpPUyE4AtlDXpnfsltjEaNeFCSY/file# [following]\n",
      "--2024-05-28 17:42:45--  https://ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com/cd/0/inline/CTzJ0ZeHC3AFIV3iv1bv9v0oMNXW03OW2waLdeKJNs0X6Tto0MSewm9RZBHwSLhqk4jWFaCmbhMGVXeWa6xPO4mAR4hC3xflJfwgS9Z4lpPUyE4AtlDXpnfsltjEaNeFCSY/file\n",
      "Resolving ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com (ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com)... 162.125.4.15, 2620:100:6016:15::a27d:10f\n",
      "Connecting to ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com (ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com)|162.125.4.15|:443... connected.\n",
      "HTTP request sent, awaiting response... 302 Found\n",
      "Location: /cd/0/inline2/CTySzMwupnuXzKpccOeYJ-7RI0NK0f7XMKBkpicHxSBuuwqAvFly51Fm0oCOwFctgeTqmD3thJsTqfFNOFHNe2JSIkJerj3mMr4Du3C7x1BcSy8t5raSfHQ_qSXF1eHrhdFII8Ou59jbofYVLe0punOl-RIa9k_v722SwkxVbg0KL9MrRL48XjX7JbsYHKTHq-gZSdAmpXpIGqS22eJavcSTuYMIy_GSZtDIs3quHM3PGU4849rG34RjpvAa-XkYDBdE996CxWupZ1C2Red9jEc5Tc6miGgt8-4LbGoxKwKF5I_Q3EqHCbvkibVR8OuKSKPtQZcNJSjsvIImzDLJ2WB6BAp2CBxz8szFF3jF3Gp6Iw/file [following]\n",
      "--2024-05-28 17:42:45--  https://ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com/cd/0/inline2/CTySzMwupnuXzKpccOeYJ-7RI0NK0f7XMKBkpicHxSBuuwqAvFly51Fm0oCOwFctgeTqmD3thJsTqfFNOFHNe2JSIkJerj3mMr4Du3C7x1BcSy8t5raSfHQ_qSXF1eHrhdFII8Ou59jbofYVLe0punOl-RIa9k_v722SwkxVbg0KL9MrRL48XjX7JbsYHKTHq-gZSdAmpXpIGqS22eJavcSTuYMIy_GSZtDIs3quHM3PGU4849rG34RjpvAa-XkYDBdE996CxWupZ1C2Red9jEc5Tc6miGgt8-4LbGoxKwKF5I_Q3EqHCbvkibVR8OuKSKPtQZcNJSjsvIImzDLJ2WB6BAp2CBxz8szFF3jF3Gp6Iw/file\n",
      "Reusing existing connection to ucc6b49e945b8d71944c85f4a76d.dl.dropboxusercontent.com:443.\n",
      "HTTP request sent, awaiting response... 200 OK\n",
      "Length: 4808625 (4.6M) [application/pdf]\n",
      "Saving to: ‘data/housing_data.pdf’\n",
      "\n",
      "data/housing_data.p 100%[===================>]   4.58M  8.26MB/s    in 0.6s    \n",
      "\n",
      "2024-05-28 17:42:47 (8.26 MB/s) - ‘data/housing_data.pdf’ saved [4808625/4808625]\n",
      "\n"
     ]
    }
   ],
   "source": [
    "!mkdir data\n",
    "!wget \"https://www.dropbox.com/scl/fi/p33j9112y0ysgwg77fdjz/2021_Housing_Inventory.pdf?rlkey=yyok6bb18s5o31snjd2dxkxz3&dl=0\" -O \"data/housing_data.pdf\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup\n",
    "\n",
    "Import our dependencies and set up our NVIDIA API key from the API catalog, https://build.nvidia.com for the two models we'll use hosted on the catalog (embedding and re-ranking models).\n",
    "\n",
    "**To get started:**\n",
    "\n",
    "1. Create a free account with [NVIDIA](https://build.nvidia.com/), which hosts NVIDIA AI Foundation models.\n",
    "\n",
    "2. Click on your model of choice.\n",
    "\n",
    "3. Under Input select the Python tab, and click `Get API Key`. Then click `Generate Key`.\n",
    "\n",
    "4. Copy and save the generated key as NVIDIA_API_KEY. From there, you should have access to the endpoints."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.core import SimpleDirectoryReader, Settings, VectorStoreIndex\n",
    "from llama_index.embeddings.nvidia import NVIDIAEmbedding\n",
    "from llama_index.llms.nvidia import NVIDIA\n",
    "from llama_index.core.node_parser import SentenceSplitter\n",
    "from llama_index.core import Settings\n",
    "from google.colab import userdata\n",
    "import os\n",
    "\n",
    "os.environ[\"NVIDIA_API_KEY\"] = userdata.get(\"nvidia-api-key\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's use a NVIDIA hosted NIM for the embedding model.\n",
    "\n",
    "NVIDIA's default embeddings only embed the first 512 tokens so we've set our chunk size to 500 to maximize the accuracy of our embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Settings.text_splitter = SentenceSplitter(chunk_size=500)\n",
    "\n",
    "documents = SimpleDirectoryReader(\"./data\").load_data()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We set our embedding model to NVIDIA's default. If a chunk exceeds the number of tokens the model can encode, the default is to throw an error, so we set `truncate=\"END\"` to instead discard tokens that go over the limit (hopefully not many because of our chunk size above)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Settings.embed_model = NVIDIAEmbedding(model=\"NV-Embed-QA\", truncate=\"END\")\n",
    "\n",
    "index = VectorStoreIndex.from_documents(documents)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we've embedded our data and indexed it in memory, we set up our LLM that's self-hosted locally. NIM can be hosted locally using Docker in 5 minutes, following this [NIM quick start guide](https://docs.nvidia.com/nim/large-language-models/latest/getting-started.html).\n",
    "\n",
    "Below, we show how to:\n",
    "- use Meta's open-source `meta/llama3-8b-instruct` model as a local NIM and\n",
    "- `meta/llama3-70b-instruct` as a NIM from the API catalog hosted by NVIDIA.\n",
    "\n",
    "If you are using a local NIM, make sure you change the `base_url` to your deployed NIM URL!\n",
    "\n",
    "We're going to retrieve the top 5 most relevant chunks to answer our question."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# self-hosted NIM: if you want to use a self-hosted NIM uncomment the line below\n",
    "# and comment the line using the API catalog\n",
    "# Settings.llm = NVIDIA(model=\"meta/llama3-8b-instruct\", base_url=\"http://your-nim-host-address:8000/v1\")\n",
    "\n",
    "# api catalog NIM: if you're using a self-hosted NIM comment the line below\n",
    "# and un-comment the line using local NIM above\n",
    "Settings.llm = NVIDIA(model=\"meta/llama3-70b-instruct\")\n",
    "\n",
    "query_engine = index.as_query_engine(similarity_top_k=20)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's ask it a simple question we know is answered in one place in the document (on page 18)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There was a net addition of 4,649 units to the City’s housing stock in 2021.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"How many new housing units were built in San Francisco in 2021?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now let's ask it a more complicated question that requires reading a table (it's on page 41 of the document):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "There is no specific information about the net gain in housing units in the Mission in 2021. The provided data is about the city's overall housing stock and production, but it does not provide a breakdown by neighborhood, including the Mission.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine.query(\n",
    "    \"What was the net gain in housing units in the Mission in 2021?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's no good! This is net new, which isn't the number we wanted. Let's try a more advanced PDF parser, LlamaParse:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-parse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Started parsing the file under job_id 84cb91f7-45ec-4b99-8281-0f4beef6a892\n"
     ]
    }
   ],
   "source": [
    "from llama_parse import LlamaParse\n",
    "\n",
    "# in a notebook, LlamaParse requires this to work\n",
    "import nest_asyncio\n",
    "\n",
    "nest_asyncio.apply()\n",
    "\n",
    "# you can get a key at cloud.llamaindex.ai\n",
    "os.environ[\"LLAMA_CLOUD_API_KEY\"] = userdata.get(\"llama-cloud-key\")\n",
    "\n",
    "# set up parser\n",
    "parser = LlamaParse(\n",
    "    result_type=\"markdown\"  # \"markdown\" and \"text\" are available\n",
    ")\n",
    "\n",
    "# use SimpleDirectoryReader to parse our file\n",
    "file_extractor = {\".pdf\": parser}\n",
    "documents2 = SimpleDirectoryReader(\n",
    "    \"./data\", file_extractor=file_extractor\n",
    ").load_data()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "index2 = VectorStoreIndex.from_documents(documents2)\n",
    "query_engine2 = index2.as_query_engine(similarity_top_k=20)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The net gain in housing units in the Mission in 2021 was 1,305 units.\n"
     ]
    }
   ],
   "source": [
    "response = query_engine2.query(\n",
    "    \"What was the net gain in housing units in the Mission in 2021?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Perfect! With a better parser, the LLM is able to answer the question.\n",
    "\n",
    "Let's now try a trickier question:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Repeat: 110\n"
     ]
    }
   ],
   "source": [
    "response = query_engine2.query(\n",
    "    \"How many affordable housing units were completed in 2021?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The LLM is getting confused; this appears to be the percentage increase in housing units.\n",
    "\n",
    "Let's try giving the LLM more context (40 instead of 20) and then sorting those chunks with a reranker. We'll use NVIDIA's reranker for this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.postprocessor.nvidia_rerank import NVIDIARerank\n",
    "\n",
    "query_engine3 = index2.as_query_engine(\n",
    "    similarity_top_k=40, node_postprocessors=[NVIDIARerank(top_n=10)]\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1,495\n"
     ]
    }
   ],
   "source": [
    "response = query_engine3.query(\n",
    "    \"How many affordable housing units were completed in 2021?\"\n",
    ")\n",
    "print(response)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Excellent! Now the figure is correct (this is on page 35, in case you're wondering)."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
